Skew-symmetric densities have recently received much attention in the literature, giving rise to
increasingly general families of univariate and multivariate skewed densities. Most of those families,
however, suffer from the inferential drawback of a potentially singular Fisher information in the
vicinity of symmetry. All existing results indicate that Gaussian densities (possibly after
restriction to some linear subspace) play a special and somewhat intriguing role in that context.
We dispel that widespread opinion by providing a full characterization, in a general multivariate
context, of the information singularity phenomenon, highlighting its relation to a possible
link between symmetric kernels and skewing functions - a link that can be interpreted as the
mismatch of two densities.

1. Introduction

Models for skewed distributions have become increasingly popular in recent years, as they 
provide a much better fit for data presenting some departure from normality, and from 
symmetry in general. Many of the proposed models in the literature allow for a continuous 
variation from symmetry to asymmetry, regulated by some finite-dimensional parameter. 

The success of those skewed distributions started with the seminal papers by Azzalini
[3, 4] introducing the scalar skew-normal model, which embeds the univariate normal
distributions into a flexible parametric class of (possibly) skewed distributions. More
formally, a random variable $X$ is said to be skew-normal with location parameter $\mu \in \mathbb{R}$,
scale parameter $\sigma \in \mathbb{R}_0^+$ and skewness parameter $\delta \in \mathbb{R}$ if it admits the
probability density function (p.d.f.)

\[ x \mapsto 2\sigma^{-1}\phi(\sigma^{-1}(x-\mu))\,\Phi(\delta\sigma^{-1}(x-\mu)), \qquad x \in \mathbb{R}, \tag{1.1} \]

where $\phi$ and $\Phi$ respectively denote the p.d.f. and cumulative distribution function (c.d.f.)
of a standard normal distribution. Besides their many appealing features, however, skew-normal
densities unfortunately also suffer from an unpleasant inferential drawback: in
the vicinity of symmetry, that is, at $\delta = 0$, the Fisher information matrix for the
three-parameter density (1.1) is singular - typically, with rank 2 instead of 3. Consequently,
skew-normal distributions happen to be problematic from an inferential point of view,
since that singularity violates the assumptions for standard Gaussian asymptotics and
precludes, at first sight, any nontrivial test of the null hypothesis of symmetry. Such
a situation has been studied by Rotnitzky et al. [21], who show that one of the parameters
then cannot be estimated at the usual root-$n$ rate, while the limit distribution of
maximum likelihood estimators might be bimodal.
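
The rank deficiency at $\delta = 0$ can be checked numerically. The following is a minimal sketch, not taken from the paper: it approximates the score of (1.1) by central finite differences and the Fisher information by a trapezoidal rule. The function names (`log_sn_density`, `score`, `fisher_information`, `det3`), the step size and the integration range are illustrative assumptions.

```python
import math

def log_sn_density(x, mu, sigma, delta):
    """Log of the skew-normal density (1.1)."""
    z = (x - mu) / sigma
    log_phi = -0.5 * z * z - 0.5 * math.log(2 * math.pi)
    # Phi(t) written with erfc so that far-left tail values stay positive.
    Phi = 0.5 * math.erfc(-delta * z / math.sqrt(2))
    return math.log(2) - math.log(sigma) + log_phi + math.log(Phi)

def score(x, theta, h=1e-5):
    """Central finite-difference gradient of the log-density in (mu, sigma, delta)."""
    grad = []
    for i in range(3):
        up = list(theta)
        dn = list(theta)
        up[i] += h
        dn[i] -= h
        grad.append((log_sn_density(x, *up) - log_sn_density(x, *dn)) / (2 * h))
    return grad

def fisher_information(theta, lo=-10.0, hi=10.0, n=2000):
    """Trapezoidal approximation of E[score score'] under the density at theta."""
    step = (hi - lo) / n
    info = [[0.0] * 3 for _ in range(3)]
    for j in range(n + 1):
        x = lo + j * step
        weight = step * (0.5 if j in (0, n) else 1.0)
        weight *= math.exp(log_sn_density(x, *theta))
        s = score(x, theta)
        for a in range(3):
            for b in range(3):
                info[a][b] += weight * s[a] * s[b]
    return info

def det3(m):
    """Determinant of a 3 x 3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

info_sym = fisher_information((0.0, 1.0, 0.0))   # at symmetry, delta = 0
info_skew = fisher_information((0.0, 1.0, 1.0))  # away from symmetry, delta = 1
```

At $(\mu, \sigma, \delta) = (0, 1, 0)$ the determinant `det3(info_sym)` vanishes up to discretization error, while `det3(info_skew)` stays clearly positive.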

This Fisher singularity problem, however, did not hamper the success of skew-normal
densities among practitioners, while theoretical extensions were developing in various
directions. Azzalini and Dalla Valle [8] and Azzalini and Capitanio [6] consider multivariate
skew-normal distributions resulting from replacing in (1.1) the univariate normal
kernel $\phi$ with its $k$-variate version $\phi_k$. In the same paper, Azzalini and Capitanio also
propose substituting an elliptical kernel $f_k$ for the normal kernel $\phi_k$ and replacing the
skewing factor $\Phi$ in (1.1) with an arbitrary, possibly non-Gaussian, univariate symmetric
c.d.f. $G$. The resulting distributions are called skew-elliptical. The class of skew-elliptical
distributions is also studied in detail by Branco and Dey [10], based, however, on a slightly
different definition. Genton and Loperfido [15] introduce a concept of generalized
skew-elliptical distributions encompassing all previous ones, where arbitrary skewing functions
(not necessarily c.d.f.s, but satisfying c.d.f.-type conditions) can be used in conjunction
with the elliptical kernel $f_k$. Finally, Azzalini and Capitanio [7] (who also propose the
nowadays commonly adopted definition of multivariate skew-t distributions), then Wang
et al. [24], relax the assumption of elliptically symmetric kernels into a weaker
assumption of central symmetry, leading to multivariate skew-symmetric densities of the
form

\[ x \mapsto f^\Pi_\vartheta(x) = f^\Pi_{\mu,\Sigma,\delta}(x) := 2|\Sigma|^{-1/2} f(\Sigma^{-1/2}(x-\mu))\,\Pi(\Sigma^{-1/2}(x-\mu), \delta), \qquad x \in \mathbb{R}^k, \tag{1.2} \]

where 

(a) $\mu \in \mathbb{R}^k$ is a location parameter, $\Sigma \in \mathcal{S}_k$ a scatter matrix (throughout, $|M|$
denotes the determinant and $M^{1/2}$ the symmetric square root of any $M$ in the class $\mathcal{S}_k$ of
symmetric positive definite $k \times k$ matrices), while $\delta \in \mathbb{R}^k$ plays the role of a skewness
parameter;

(b) the symmetric kernel $f$ is a centrally symmetric nonvanishing p.d.f., meaning that
$0 \neq f(-z) = f(z)$, $z \in \mathbb{R}^k$, and

(c) the skewing function $\Pi: \mathbb{R}^k \times \mathbb{R}^k \to [0,1]$ satisfies $\Pi(-z,\delta) + \Pi(z,\delta) = 1$,
$z, \delta \in \mathbb{R}^k$, and $\Pi(z, 0) = 1/2$, $z \in \mathbb{R}^k$.

This definition is the one we adopt in the sequel. While $\Pi(z,\delta)$, in most practical
situations, is of the simple form $\Pi(\delta' z)$, with $\Pi: \mathbb{R} \to [0,1]$, Wang et al. [24] actually do
not consider any specific $\delta$-parameterization. Our parametric approach (with the regularity
assumptions (A2)-(A2+) and (B2)-(B2+) of Sections 2.1 and 3.1, respectively) is in the spirit
of - if not at the same level of mathematical generality as - the differentiable path and
tangent space approach taken in the local and asymptotic treatment of semiparametric
models (see, e.g., Chapter 25 of van der Vaart [23]). Also, the condition that $f$ is a
nonvanishing density is not imposed by Wang et al. [24]; we are adding that requirement in
order to avoid inessential complications related to bounded and parameter-dependent
supports. For further information about skew-symmetric models and related topics, we
refer the reader to the recent monograph by Genton [14], and to the review papers by Arnold
and Beaver [2] and Azzalini [5].

The issue of singular Fisher information runs like a red thread through all those
developments. Mentioned, from the very beginning, in Azzalini [3] itself, it is discussed,
in the univariate and multivariate skew-normal context, by Azzalini and Capitanio [6],
Pewsey [19], Chiogna [11] and Arellano-Valle and Azzalini [1]. The same issue has been
considered in various subclasses of skew-symmetric distributions. Pewsey [20] and
Azzalini and Genton [9] establish that the singularity problem remains after replacement of
the c.d.f. $\Phi$ in (1.1) with any c.d.f. $H$ satisfying mild regularity assumptions. DiCiccio
and Monti [12] prove that, within the class of univariate skew-exponential power
distributions of Azzalini [4], the normal kernels are the only ones suffering from singular Fisher
information. The same result is shown to hold true for two classes of scalar skew-t
distributions by Gomez et al. [16] and DiCiccio and Monti [13]. The multivariate counterparts
of these statements are provided in Ley and Paindaveine [17, 18], respectively.

Finally, the very general (still a special case of (1.2), though) class of multivariate
skew-symmetric densities of the form

\[ x \mapsto 2|\Sigma|^{-1/2} f(\Sigma^{-1/2}(x-\mu))\,\Pi(\delta'\Sigma^{-1/2}(x-\mu)), \qquad x \in \mathbb{R}^k, \tag{1.3} \]

encompassing all previous cases, is considered in Ley and Paindaveine [17], who characterize,
for each possible value $1 \le m \le k$ of the Fisher information rank deficiency, the
form of the symmetric kernels giving rise to such deficiency. Here again, Gaussian kernels
play a very special role. In the univariate setup and within the subclass of
multivariate generalized skew-elliptical distributions, only the skew-normal densities are
affected by the singularity problem. Although results in the fully general (for densities
of the form (1.3)) multivariate case are more complex, only kernels exhibiting Gaussian
restrictions on some $m$-dimensional linear subspaces can lead to degenerate Fisher
information.

A tentative remedy to that singularity problem was suggested by Azzalini himself
who, as early as 1985, in his original paper, proposes a reparametrization of skew-normal
families, the so-called centered parametrization, under which Fisher information matrices
remain full-rank. The multivariate version of that reparametrization is examined in detail
by Arellano-Valle and Azzalini [1]. That solution, however, never really caught on in
practice, partly because the structure of the skewing mechanism, hence of the resulting
skew-normal family, under the new parametrization, loses much of its simplicity (certainly
so in the multivariate context), partly because of its limitation to skew-normal families.
Azzalini and Genton [9] therefore once again emphasize the need for a clarification of the
Fisher singularity phenomenon in order to "remove, or at least alleviate, the necessity of
an alternative parametrization."

The objective of the present paper is to provide such a clarification. While all comments
and existing results on this singular Fisher information issue seemed to be pointing at
some special status for normal kernels and, consequently, skew-normal distributions, we
completely dispel the idea of any particular role of Gaussian kernels. Turning to the
fully general class of skew-symmetric densities described in (1.2), we show indeed that
information deficiency actually originates in an unfortunate mismatch between $f$ and $\Pi$
- more specifically, between two densities, the kernel $f$ and an exponential density $g_\Pi$
associated with the skewing function $\Pi$.

A tale of two densities, thus, rather than a Gaussian mystery...

The paper is organized as follows. Section 2.1 deals with the univariate setup, where
the singularity problem is simple, as the rank of the three-parameter Fisher information
matrix can only be 3 or 2. The result is derived in an informal way, and some examples
of skewing functions are treated in Section 2.2. A more formal statement of the general
solution is provided for the multivariate setup in Section 3.1, along with some examples
in Section 3.2. Final comments and conclusions are given in Section 4.

2. The univariate setup 
2.1. A tale of two densities . . . 

We start by analyzing the information singularity problem in the univariate case. To do
so, consider the class of skew-symmetric probability density families of the form

\[ x \mapsto f^\Pi_\vartheta(x) = f^\Pi_{\mu,\sigma,\delta}(x) := 2\sigma^{-1} f(\sigma^{-1}(x-\mu))\,\Pi(\sigma^{-1}(x-\mu), \delta), \qquad x \in \mathbb{R}, \tag{2.1} \]

with $\vartheta := (\mu, \sigma, \delta)'$, where $\mu \in \mathbb{R}$ is a location parameter, $\sigma \in \mathbb{R}_0^+$ a scale parameter and
$\delta \in \mathbb{R}$ an asymmetry parameter.

The symmetric kernel $f: \mathbb{R} \to \mathbb{R}_0^+$ in (2.1) is a nonvanishing symmetric standardized
p.d.f., that is, a probability density function such that $f(z) = f(-z) \neq 0$ for all $z \in \mathbb{R}$,
with scale parameter one - an identification constraint for $\sigma$ that does not imply any loss
of generality. Classical standardization, with a constraint of the form $\int_{-\infty}^{\infty} z^2 f(z)\,dz = 1$,
involves the variance of $Z$ with p.d.f. $f$; the scale parameter $\sigma^2$ then is the mean squared
deviation $E[(X-\mu)^2]$ with respect to $\mu$ of $X$ with p.d.f. $f^\Pi_{\mu,\sigma,0}$. If moment assumptions
are to be avoided, one may rather consider, for instance, medians of squares, with an
identification constraint of the form $\int_{-\infty}^{1} f(z)\,dz = 0.75$: if $X$ has p.d.f. $f^\Pi_{\mu,\sigma,0}$, $\sigma$ then is
the median of the absolute deviation $|X-\mu|$, which exists irrespective of the density
of $X$. Other quantiles of $|X-\mu|$ would enjoy similar properties. We assume throughout
that such an identification constraint, hence a concept of scale, has been adopted. That
choice, however, is completely arbitrary, and any element in the scale family of p.d.f.'s
of the form (2.1) with $\mu = \delta = 0$ could be chosen as the reference density characterizing
unit scale - hence could serve as a symmetric kernel for the same skew-symmetric family.
As we shall see, that choice has no impact on the results of this paper.

The second factor in (2.1) is a skewing function, namely, a function $\Pi: \mathbb{R} \times \mathbb{R} \to [0,1]$
such that $\Pi(-z,\delta) + \Pi(z,\delta) = 1$ for all $z, \delta \in \mathbb{R}$, and $\Pi(z,0) = 1/2$ for all $z \in \mathbb{R}$.
Traditional choices involve $\Pi(z,\delta) = \Phi(\delta z)$ (skew-normal distributions, Azzalini [3]),
$\Pi(z,\delta) = \Phi(\delta\,\mathrm{sign}(z)|z|^{\alpha/2}(2/\alpha)^{1/2})$ (skew-exponential power distributions, Azzalini [4])
or $\Pi(z,\delta) = G(\delta z)$ for any symmetric univariate c.d.f. $G$ (skew-symmetric distributions,
Azzalini and Capitanio [6]). The class of skewing functions considered here is much
broader.

The regularity assumptions we are making on $f$ and $\Pi$ are as follows.

Assumption (A1). The mapping $z \mapsto f(z)$ is differentiable, with derivative $\dot f$ such that,
letting $\varphi_f := -\dot f/f$, the information quantity for location $\sigma^{-2} I_f$, with

\[ I_f := \int_{-\infty}^{\infty} \varphi_f^2(z) f(z)\,dz, \]

is finite.

Assumption (A1+). Same as (A1), but the information quantity for scale $\sigma^{-2} J_f$, with

\[ J_f := \int_{-\infty}^{\infty} (z\varphi_f(z) - 1)^2 f(z)\,dz, \]

moreover is finite.

Assumption (A2). (i) The mapping $z \mapsto \Pi(z,\delta)$ is differentiable, and its derivative
equals 0 at $\delta = 0$. (ii) The mapping $\delta \mapsto \Pi(z,\delta)$ is differentiable at $\delta = 0$ for all $z \in \mathbb{R}$,
with derivative (at $\delta = 0$) $\partial_\delta \Pi(z,\delta)|_{\delta=0} =: \psi(z)$ such that $z \mapsto \psi(z)$ admits a primitive,
denoted as $\Psi$.

Assumption (A2+). Same as (A2), but the quantity

\[ \int_{-\infty}^{\infty} \psi^2(z) f(z)\,dz \]

moreover is finite.

These assumptions essentially guarantee the existence and finiteness of Fisher information
at $\delta = 0$; the differentiability and integrability conditions could be relaxed into
weaker differentiability properties such as quadratic mean differentiability. This small
gain of generality, however, would require a generalized definition of information (in the
Le Cam style), with non-negligible technical complications. For the sake of simplicity,
we stick to a more traditional approach and the traditional definition of Fisher information.
Note that this definition differs from the one, used by some authors, of an observed
Fisher information, that is, the empirical value of the matrix of negative second-order
derivatives of the log-likelihood evaluated at the maximum likelihood estimator of the
parameters.

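Under Assumptions (A1) and (A2), the score vector $\ell_{f;\vartheta}$, at $(\mu, \sigma, 0)' =: \vartheta_0$, takes the
form

\[ \ell_{f;\vartheta_0}(x) := \operatorname{grad}_\vartheta \log f^\Pi_\vartheta(x)\big|_{\vartheta_0} =: \big(\ell^1_{f;\vartheta_0}(x), \ell^2_{f;\vartheta_0}(x), \ell^3_{f;\vartheta_0}(x)\big)'
= \begin{pmatrix} \sigma^{-1}\varphi_f(\sigma^{-1}(x-\mu)) \\ \sigma^{-1}\big(\sigma^{-1}(x-\mu)\,\varphi_f(\sigma^{-1}(x-\mu)) - 1\big) \\ 2\psi(\sigma^{-1}(x-\mu)) \end{pmatrix}, \]

where the factor 2 in $\ell^3_{f;\vartheta_0}$ follows from the fact that $\Pi(z,0) = 1/2$ for all $z \in \mathbb{R}$.
Assumption (A2)(i) is a mild requirement which, in regular models, readily follows from the fact
that $\Pi(z,0) = 1/2$, and ensures that the skewing function $\Pi$ plays no role in the score
functions for $\mu$ and $\sigma$ at $\delta = 0$.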

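Under Assumptions (A1+) and (A2+), the $3 \times 3$ Fisher information matrix for $(\mu, \sigma, \delta)$
exists, and takes the form

\[ \Gamma_{f;\vartheta_0} := \sigma^{-1} \int_{-\infty}^{\infty} \ell_{f;\vartheta_0}(x)\,\ell'_{f;\vartheta_0}(x)\, f(\sigma^{-1}(x-\mu))\,dx
= \begin{pmatrix} \gamma^{11}_{f;\vartheta_0} & 0 & \gamma^{13}_{f;\vartheta_0} \\ 0 & \gamma^{22}_{f;\vartheta_0} & 0 \\ \gamma^{13}_{f;\vartheta_0} & 0 & \gamma^{33}_{f;\vartheta_0} \end{pmatrix}, \]

with $\gamma^{11}_{f;\vartheta_0} = \sigma^{-2} I_f$, $\gamma^{22}_{f;\vartheta_0} = \sigma^{-2} J_f$,

\[ \gamma^{33}_{f;\vartheta_0} := 4\int_{-\infty}^{\infty} \psi^2(z) f(z)\,dz \]

and

\[ \gamma^{13}_{f;\vartheta_0} := 2\sigma^{-1}\int_{-\infty}^{\infty} \varphi_f(z)\psi(z) f(z)\,dz. \]

The zeroes in $\Gamma_{f;\vartheta_0}$ are easily obtained by noting that $\ell^1_{f;\vartheta_0}$ and $\ell^3_{f;\vartheta_0}$ are antisymmetric
functions of $(x - \mu)$, whereas $\ell^2_{f;\vartheta_0}$ is symmetric with respect to the same quantity.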
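
As an illustration of this structure, the sketch below assembles the $3 \times 3$ matrix by quadrature at $\sigma = 1$ from the three score components. The logistic kernel, the skewing function $\Phi(\delta z)$, the Simpson rule and all function names are illustrative assumptions, not choices made in the paper.

```python
import math

def simpson(g, a, b, n=4000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def f(z):
    """Logistic density (symmetric, nonvanishing); written with |z| to avoid overflow."""
    e = math.exp(-abs(z))
    return e / (1 + e) ** 2

def phi_f(z):
    """Location score -f'(z)/f(z) of the logistic density."""
    return math.tanh(z / 2)

def psi(z):
    """Derivative in delta, at delta = 0, of the skewing function Phi(delta * z)."""
    return z / math.sqrt(2 * math.pi)

def scores(z, sigma=1.0):
    """The three score components, as functions of z = (x - mu) / sigma."""
    return (phi_f(z) / sigma, (z * phi_f(z) - 1) / sigma, 2 * psi(z))

def fisher_matrix(sigma=1.0, lim=40.0):
    """Integrate score * score' against f, after the change of variable x -> z."""
    return [[simpson(lambda z: scores(z, sigma)[a] * scores(z, sigma)[b] * f(z), -lim, lim)
             for b in range(3)] for a in range(3)]

gamma = fisher_matrix()
```

The entries `gamma[0][1]` and `gamma[1][2]` vanish by symmetry, and for this particular pair the remaining $2 \times 2$ part in $(\mu, \delta)$ is nonsingular.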

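It then trivially follows that singularity of $\Gamma_{f;\vartheta_0}$ can only be due to the singularity of
the $2 \times 2$ submatrix

\[ \Gamma^0_{f;\vartheta_0} := \begin{pmatrix} \gamma^{11}_{f;\vartheta_0} & \gamma^{13}_{f;\vartheta_0} \\ \gamma^{13}_{f;\vartheta_0} & \gamma^{33}_{f;\vartheta_0} \end{pmatrix}, \]

the existence of which, however, only requires Assumptions (A1) and (A2+). Clearly,
either $\Gamma^0_{f;\vartheta_0}$ is full-rank or, in case $\gamma^{11}_{f;\vartheta_0}\gamma^{33}_{f;\vartheta_0} = (\gamma^{13}_{f;\vartheta_0})^2$, it has rank 1.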

Now, the Cauchy-Schwarz inequality implies that $(\gamma^{13}_{f;\vartheta_0})^2 \le \gamma^{11}_{f;\vartheta_0}\gamma^{33}_{f;\vartheta_0}$, with equality
if and only if

\[ \varphi_f = a\psi \quad f\text{-a.s. (equivalently, Lebesgue-a.e.)} \tag{2.2} \]

for some constant $a \in \mathbb{R}$. It thus follows that $\Gamma^0_{f;\vartheta_0}$ is singular for any $\vartheta_0 = (\mu, \sigma, 0)'$
if and only if (2.2) is satisfied for some $a \in \mathbb{R}$. This holds under Assumptions (A1)
and (A2+). If Assumption (A1) is reinforced into (A1+), the $2 \times 2$ singularity of $\Gamma^0_{f;\vartheta_0}$ in
turn is equivalent to the $3 \times 3$ singularity of $\Gamma_{f;\vartheta_0}$. Replacing $\varphi_f$ with its definition, the
necessary and sufficient condition $\varphi_f = a\psi$ reads $-\dot f/f = a\psi$, a first-order differential
equation whose solutions are of the form $f(x) = c\exp(-a\Psi(x))$ for some $a \in \mathbb{R}$, where $\Psi$ is
a primitive of $\psi$ and $c \in \mathbb{R}_0^+$ an integration constant.
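
The equality case can be illustrated with a deliberately non-Gaussian pair. In the sketch below, the skewing function $\Phi(\delta z^3)$ is an illustrative assumption (it gives $\psi(z) = z^3/\sqrt{2\pi}$ and $\Psi(z) = z^4/(4\sqrt{2\pi})$), as are the value $a = 1$ and the helper names; kernels are normalized numerically rather than standardized, which only affects scale.

```python
import math

def simpson(g, a, b, n=4000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

C0 = 1 / math.sqrt(2 * math.pi)   # derivative of Phi at 0

def psi(z):
    """Derivative in delta, at delta = 0, of the assumed skewing function Phi(delta * z**3)."""
    return C0 * z ** 3

def Psi(z):
    """A primitive of psi."""
    return C0 * z ** 4 / 4

def gap(log_kernel, phi_f, lim=12.0):
    """gamma11 * gamma33 - gamma13**2 at sigma = 1, for a kernel given by its log (up to a constant)."""
    w = lambda z: math.exp(log_kernel(z))
    norm = simpson(w, -lim, lim)
    mean = lambda g: simpson(lambda z: g(z) * w(z), -lim, lim) / norm
    g11 = mean(lambda z: phi_f(z) ** 2)
    g33 = 4 * mean(lambda z: psi(z) ** 2)
    g13 = 2 * mean(lambda z: phi_f(z) * psi(z))
    return g11 * g33 - g13 ** 2

a = 1.0
# Matched kernel f = c * exp(-a * Psi): its location score is exactly a * psi.
gap_matched = gap(lambda z: -a * Psi(z), lambda z: a * psi(z))
# Standard normal kernel with the same skewing function: location score is z.
gap_normal = gap(lambda z: -0.5 * z * z, lambda z: z)
```

Here `gap_matched` is zero up to rounding, whereas `gap_normal` is strictly positive (standard normal moments give $12/\pi$): with this skewing function it is the quartic-exponential kernel, not the Gaussian one, that produces a singular information matrix.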

Summing up, let the couple $(f, \Pi)$ satisfy Assumptions (A1+) and (A2+): $\Gamma_{f;\vartheta_0}$ is
singular for all $\vartheta_0$ if and only if the symmetric kernel $f$ belongs to the exponential family

\[ \mathcal{E}_\Psi := \Big\{ g_a := \exp(-a\Psi) \Big/ \int_{-\infty}^{\infty} \exp(-a\Psi(z))\,dz \;\Big|\; a \in \mathcal{A} \Big\} \tag{2.3} \]

with minimal sufficient statistic $\Psi$, natural parameter $-a$, and natural parameter space

\[ \mathcal{A} := \Big\{ a \in \mathbb{R} \text{ such that } \int_{-\infty}^{\infty} \exp(-a\Psi(z))\,dz < \infty \Big\}. \]

The same statement can be made under Assumptions (A1) and (A2+) about the singularity
of $\Gamma^0_{f;\vartheta_0}$.

Note that $\mathcal{A}$, as the natural parameter space of an exponential family, is an open
interval of $\mathbb{R}$. The unique value $a_\Pi$ of $a \in \mathcal{A}$ such that $f$ and $g_{a_\Pi}$ coincide, if any, is
entirely determined by the standardization constraint on $f$. If the classical variance-based
standardization is adopted, then $a_\Pi$ is the solution of the equation

\[ \int_{-\infty}^{\infty} z^2 \exp(-a\Psi(z))\,dz = \int_{-\infty}^{\infty} \exp(-a\Psi(z))\,dz. \]

If standardization is imposed via medians of squares, $a_\Pi$ is the solution of

\[ \int_{-\infty}^{1} \exp(-a\Psi(z))\,dz = 3\int_{1}^{\infty} \exp(-a\Psi(z))\,dz. \]

Letting $f_\sigma(x) := \sigma^{-1} f(x/\sigma)$, $\sigma \in \mathbb{R}_0^+$, also note that $f \in \mathcal{E}_\Psi$ if and only if
$f_\sigma \in \mathcal{E}_{\Psi\circ\sigma^{-1}}$, where $\mathcal{E}_{\Psi\circ\sigma^{-1}}$ stands for the exponential family with minimal sufficient
statistic $\Psi\circ\sigma^{-1}: z \mapsto \Psi(\sigma^{-1}z)$. It is easy to see that both conditions moreover determine the
same $a_\Pi$, which confirms that the arbitrary choice of a scale parameter has no impact
on the result.
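
The variance-based equation for $a_\Pi$ can be solved numerically. The sketch below uses plain bisection and assumes that the variance of $g_a$ is decreasing in $a$ on the bracketing interval; the helper names, the bracket $[0.5, 10]$ and the choice $\Psi(z) = z^2/(2\sqrt{2\pi})$ (the primitive arising from the skewing function $\Phi(\delta z)$) are illustrative assumptions.

```python
import math

def simpson(g, a, b, n=4000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def variance(a, Psi, lim=30.0):
    """Variance of the density proportional to exp(-a * Psi), by quadrature on [-lim, lim]."""
    w = lambda z: math.exp(-a * Psi(z))
    return simpson(lambda z: z * z * w(z), -lim, lim) / simpson(w, -lim, lim)

def solve_a(Psi, lo, hi, tol=1e-10):
    """Bisection for variance(a) = 1, assuming the variance decreases in a on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if variance(mid, Psi) > 1.0:
            lo = mid      # variance too large: a must increase
        else:
            hi = mid
    return 0.5 * (lo + hi)

def Psi_1(z):
    """Primitive of psi(z) = z / sqrt(2 * pi), i.e. the case Pi(z, delta) = Phi(delta * z)."""
    return z * z / (2 * math.sqrt(2 * math.pi))

a_pi = solve_a(Psi_1, 0.5, 10.0)
```

For this $\Psi$ the density $g_a$ is centered normal with variance $\sqrt{2\pi}/a$, so the solver returns a value close to $\sqrt{2\pi}$.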

As a consequence of those results, it follows that, for any symmetric density $f$ satisfying
Assumption (A1+) (resp., Assumption (A1)), there exists a skewing function $\Pi_f$
(infinitely many of them, actually) such that $\Gamma_{f;\vartheta_0}$ (resp., $\Gamma^0_{f;\vartheta_0}$) exists and is singular for
any $\vartheta_0$; among them, with $a_\Pi = \sqrt{2\pi}$, $\Pi_f(z,\delta) := \Phi(\delta\varphi_f(z))$, for which Assumption (A2+)
holds.
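
A quick numerical check of this construction, as a sketch under stated assumptions: the kernel is taken to be logistic (so that $\varphi_f(z) = \tanh(z/2)$), $\psi$ is obtained by a central finite difference of $\Pi_f$ in $\delta$, and all names and step sizes are illustrative.

```python
import math

def simpson(g, a, b, n=4000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def Phi(t):
    """Standard normal c.d.f."""
    return 0.5 * math.erfc(-t / math.sqrt(2))

def f(z):
    """Logistic kernel (illustrative choice)."""
    e = math.exp(-abs(z))
    return e / (1 + e) ** 2

def phi_f(z):
    """Location score -f'(z)/f(z) of the logistic kernel."""
    return math.tanh(z / 2)

def Pi_f(z, delta):
    """Skewing function built from the kernel's own score."""
    return Phi(delta * phi_f(z))

def psi(z, h=1e-6):
    """Central finite difference of Pi_f in delta at delta = 0."""
    return (Pi_f(z, h) - Pi_f(z, -h)) / (2 * h)

lim = 40.0
g11 = simpson(lambda z: phi_f(z) ** 2 * f(z), -lim, lim)
g33 = 4 * simpson(lambda z: psi(z) ** 2 * f(z), -lim, lim)
g13 = 2 * simpson(lambda z: phi_f(z) * psi(z) * f(z), -lim, lim)
gap = g11 * g33 - g13 ** 2          # determinant of the 2 x 2 submatrix at sigma = 1
a_ratio = phi_f(1.0) / psi(1.0)     # proportionality constant between phi_f and psi
```

The determinant `gap` vanishes up to finite-difference error, and `a_ratio` is close to $\sqrt{2\pi}$, in line with the value of $a_\Pi$ quoted in the text.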

The converse is slightly more subtle. Let $\Pi$ be a skewing function satisfying Assumption
(A2); a function $\Psi$ with derivative $\psi$ thus exists, which automatically satisfies
$\Psi(z) = \Psi(-z)$. If there exists a density $g_a$ in the corresponding exponential family (2.3)
such that $\int_{-\infty}^{\infty} \psi^2(z) g_a(z)\,dz$ is finite, then the skew-symmetric family with symmetric
kernel $f = g_a$ and skewing function $\Pi$ is such that Assumptions (A1) and (A2+) hold,
and the corresponding $2 \times 2$ matrix $\Gamma^0_{f;\vartheta_0}$ exists and is singular for any $\vartheta_0$. If moreover
$f = g_a$ also satisfies Assumption (A1+), then the $3 \times 3$ information matrix $\Gamma_{f;\vartheta_0}$ exists,
and is singular for any $\vartheta_0$. Note, however, that the reference density for scale - the one
that, by definition, provides the unit scale - here is $f = g_a$.

A tale of two densities, $f$ and $g_{a_\Pi}$, is emerging, which demythifies the seemingly singular
role of the Gaussian distribution.

This treatment of the univariate case provides a good intuition for the more complex
$k$-dimensional problem where, as we shall see, the rank of the Fisher information matrix
can take any value between $k + k(k+1)/2 = k(k+3)/2$ and $2k + k(k+1)/2 = k(k+5)/2$.
Since the univariate case follows as a particular case by letting $k = 1$ in the general result
of Proposition 3.1 of the next section, we do not provide a more formal statement here.

2.2. Some examples 

In order to illustrate the results of the previous section, we now apply our findings to
three examples of skewing functions and determine the exponential family, with
corresponding minimal sufficient statistic and natural parameter space, leading to singular
Fisher information matrices.

As a first example, we propose the most usual class of skewing functions, namely those
of the form $\Pi_1(z,\delta) := \Pi(\delta z)$, where $\Pi: \mathbb{R} \to [0,1]$ is a function satisfying $\Pi(-y) + \Pi(y) = 1$
for all $y \in \mathbb{R}$ (hence $\Pi(0) = 1/2$) and such that $\dot\Pi(0) := d\Pi(y)/dy|_{y=0}$ exists and differs
from 0. Clearly, any symmetric univariate c.d.f. could be used, in which case we retrieve the
skew-symmetric distributions of Azzalini and Capitanio [6], and, for $f = \phi$ and $\Pi = \Phi$, the
skew-normal distributions of Azzalini [3]. For more examples of skewed distributions of
this type, we refer the reader to Gomez et al. [16]. Straightforward calculations show that
$\psi_1(z) = \dot\Pi(0)z$, and hence the minimal sufficient statistic characterizing the exponential
family (2.3) is $\Psi_1(z) = \dot\Pi(0)z^2/2$. The resulting exponential family $\mathcal{E}_{\Psi_1}$ thus is nothing
but the family of centered normal densities of the form

\[ g_a^{(1)}(z) = \exp(-a\dot\Pi(0)z^2/2)\,(2\pi/(a\dot\Pi(0)))^{-1/2}, \]

with natural parameter space $\mathcal{A}_1 := \mathrm{sign}(\dot\Pi(0))\,\mathbb{R}_0^+$. Assumptions (A1+) and (A2+) are
satisfied, hence the $3 \times 3$ matrix $\Gamma_{f;\vartheta_0}$ exists. Thus, whenever the traditional skewing
function $\Pi_1$ is used, Gaussian kernels are the only problematic ones regarding singular
Fisher information at $\delta = 0$. This result, combined with the popularity of $\Pi_1$ as a skewing
function, explains the long-standing belief in a particular role of the Gaussian distribution.
Note that our findings are in line with earlier ones by Gomez et al. [16], who show
that, by combining a Student kernel with $\nu$ degrees of freedom and a skewing function of
the form $\Pi_1$, Fisher information at $\delta = 0$ is non-singular in general but becomes singular
as $\nu \to \infty$. And, more generally, our results are in total accordance with those of Ley and
Paindaveine [17] for the total class of skew-symmetric distributions of this kind.
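
The Student case can be explored numerically. The sketch below computes, for a Student kernel with $\nu > 2$ degrees of freedom and a skewing function of the form $\Pi(\delta z)$, the normalized gap $1 - (\gamma^{13})^2/(\gamma^{11}\gamma^{33})$, which does not depend on $\dot\Pi(0)$ or on the scale of the kernel. The substitution $z = \sqrt{\nu}\tan t$, the Simpson rule and the function names are illustrative assumptions.

```python
import math

def simpson(g, a, b, n=4000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def gap_student(nu, n=4000):
    """Normalized gap 1 - g13^2 / (g11 * g33) for a Student kernel with nu > 2 degrees of freedom.

    With z = sqrt(nu) * tan(t), the kernel weight f(z) dz is proportional to
    cos(t)**(nu - 1) dt, the location score is ((nu + 1) / sqrt(nu)) * sin(t) * cos(t),
    and psi is proportional to z.
    """
    half = math.pi / 2
    w = lambda t: math.cos(t) ** (nu - 1)
    norm = simpson(w, -half, half, n)
    # E[phi_f(Z)^2]
    e_phi2 = simpson(lambda t: ((nu + 1) ** 2 / nu) * (math.sin(t) * math.cos(t)) ** 2 * w(t),
                     -half, half, n) / norm
    # E[Z^2] = nu * E[tan(t)^2], written so that the integrand stays bounded
    e_z2 = simpson(lambda t: nu * math.sin(t) ** 2 * math.cos(t) ** (nu - 3),
                   -half, half, n) / norm
    # E[phi_f(Z) * Z]
    e_phiz = simpson(lambda t: (nu + 1) * math.sin(t) ** 2 * w(t), -half, half, n) / norm
    return 1 - e_phiz ** 2 / (e_phi2 * e_z2)

gaps = {nu: gap_student(nu) for nu in (3, 10, 100)}
```

The computed gaps are positive for every finite $\nu$ and shrink toward zero as $\nu$ grows; they agree with the value $6/(\nu(\nu+1))$ obtained from standard Student moments.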

Next consider the class of skewing functions $\Pi_2(z,\delta) := \Pi(\delta\,\mathrm{sign}(z)|z|^{\alpha/2}(2/\alpha)^{1/2})$
with $\alpha > 1$ and $y \mapsto \Pi(y)$ satisfying the usual conditions. Clearly, for $\alpha = 2$, $\Pi_2$
coincides with $\Pi_1$. This second type of skewing function was used, with $\Pi = \Phi$, by
Azzalini [4] to define skew-exponential power distributions. One immediately obtains
$\psi_2(z) = \dot\Pi(0)\,\mathrm{sign}(z)|z|^{\alpha/2}(2/\alpha)^{1/2}$, and, consequently,

\[ \Psi_2(z) = \dot\Pi(0)\,|z|^{\alpha/2+1}(2/\alpha)^{1/2}(\alpha/2+1)^{-1}. \]

The corresponding exponential family $\mathcal{E}_{\Psi_2}$ contains all densities of the form

\[ g_a^{(2)}(z) = c\exp\big(-a\dot\Pi(0)(2/\alpha)^{1/2}(\alpha/2+1)^{-1}|z|^{\alpha/2+1}\big), \]

where $c$ is a normalization constant and $a$ again ranges over either the positive or the
negative real half line, depending on the sign of $\dot\Pi(0)$. One can easily check that the
complete Fisher information matrix is well-defined in this case. DiCiccio and Monti [12]
prove that, for $\alpha \neq 2$, skew-exponential power distributions do not suffer from singular
Fisher information matrices in the vicinity of symmetry. Our findings not only confirm
that result, but also provide some further insight into the reasons for that absence of
singularity. Actually, the exponent of $|z|$ in $g_a^{(2)}$ has to be $\alpha/2 + 1$, while the symmetric
kernels in skew-exponential power distributions as defined in Azzalini [4] are of the form
$c\exp(-|z|^\alpha/\alpha)$; the two exponents coincide only for $\alpha = 2$. Thus, while skew-normal
distributions involve a symmetric kernel and a skewing function which are in a problematic
relationship, this is avoided with the class of skew-exponential power distributions.

As a final example, consider skewing functions of the form $\Pi_3(z,\delta) := \Pi(\delta\sin(z))$,
with $\Pi$ belonging to the same class of functions as in the two preceding examples. It is
easy to check that $\Pi_3$ then actually is a skewing function satisfying Assumption (A2+).
Direct manipulations yield $\psi_3(z) = \dot\Pi(0)\sin(z)$ and $\Psi_3(z) = -\dot\Pi(0)\cos(z)$. The natural
parameter space $\mathcal{A}_3$ of the exponential family $\mathcal{E}_{\Psi_3}$ corresponding to the minimal sufficient
statistic $\Psi_3$ is empty. In other words, no symmetric kernel $f$ yields a reduced Fisher
information matrix when the skewing function $\Pi_3$ is adopted. Figure 1 shows some of
the skewed densities obtained by combining $\Pi_3$ (for $\Pi = \Phi$) with a standard normal
kernel. Comparison with the original skew-normal distributions of Azzalini [3] indicates
that the new family, which is immune from degenerate Fisher information problems, is
nevertheless extremely close to Azzalini's classical one.
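
Both claims about $\Pi_3$ can be illustrated numerically for $\Pi = \Phi$. In the sketch below (illustrative names, Simpson quadrature, $a = 1$), the mass of $\exp(-a\Psi_3)$ over $[-L, L]$ grows linearly with $L$, so that no normalizing constant exists, and the $2 \times 2$ determinant for a standard normal kernel is strictly positive.

```python
import math

def simpson(g, a, b, n=4000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

C0 = 1 / math.sqrt(2 * math.pi)          # derivative of Phi at 0

def psi_3(z):
    return C0 * math.sin(z)

def Psi_3(z):
    return -C0 * math.cos(z)

def mass(a, L):
    """Integral of exp(-a * Psi_3) over [-L, L]."""
    return simpson(lambda z: math.exp(-a * Psi_3(z)), -L, L, n=20000)

period = 2 * math.pi
growth = [mass(1.0, m * period) for m in (1, 2, 4)]   # the integrand is periodic

def phi(z):
    """Standard normal density, used here as the symmetric kernel."""
    return C0 * math.exp(-0.5 * z * z)

g11 = simpson(lambda z: z * z * phi(z), -12, 12)                 # location score is z
g33 = 4 * simpson(lambda z: psi_3(z) ** 2 * phi(z), -12, 12)
g13 = 2 * simpson(lambda z: z * psi_3(z) * phi(z), -12, 12)
gap = g11 * g33 - g13 ** 2
```

Doubling $L$ doubles the mass, whatever the sign of $a$ would be, and `gap` is close to $(1 - 2e^{-1} - e^{-2})/\pi$, which follows from the Gaussian moments $E[\sin^2 Z] = (1 - e^{-2})/2$ and $E[Z \sin Z] = e^{-1/2}$.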



Figure 1. Plots of the original Azzalini [3] skew-normal density $2\phi(x)\Phi(\delta x)$ (left) and the
$\Pi_3$-based version $2\phi(x)\Phi(\delta\sin(x))$ (right), for $\delta = 0$ (darker), 0.5, 2, and 6 (lighter).

3. The multivariate setup
3.1. A further tale . . . 

Before starting our investigation of the multivariate case, let us introduce some further
notation required when passing from dimension 1 to $k > 1$. For any given $k \times k$ matrix $M$,
we denote by $\mathrm{vec}(M)$ the $k^2$-vector obtained by stacking the columns of $M$ on top of
each other, and by $\mathrm{vech}(M)$ the $k(k+1)/2$-subvector of $\mathrm{vec}(M)$ in which only the entries on
or above the diagonal of $M$ are retained. We write $P_k$ for the $k(k+1)/2 \times k^2$ matrix such
that $P_k'(\mathrm{vech}\,M) = \mathrm{vec}(M)$ for any symmetric $M$, and $I_k$ for the $k \times k$ identity matrix.
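
These operators are easy to make concrete. The sketch below uses plain Python lists; the function names (`vec`, `vech`, `build_P`, `transpose_times`) and the test matrix are illustrative assumptions, and `vech` keeps the upper-triangular entries in the order in which they appear in `vec`.

```python
def vec(M):
    """Stack the columns of a k x k matrix (list of rows) into a list of length k*k."""
    k = len(M)
    return [M[i][j] for j in range(k) for i in range(k)]

def vech(M):
    """Keep the entries of vec(M) lying on or above the diagonal (i <= j), in vec order."""
    k = len(M)
    return [M[i][j] for j in range(k) for i in range(j + 1)]

def build_P(k):
    """P_k as a list of k(k+1)/2 rows of length k*k, such that P_k' vech(M) = vec(M)."""
    index = {}
    for j in range(k):
        for i in range(j + 1):
            index[(i, j)] = len(index)          # position of entry (i, j) in vech
    P = [[0] * (k * k) for _ in range(len(index))]
    for j in range(k):
        for i in range(k):
            # entry (i, j) of a symmetric M equals entry (min, max) kept by vech
            P[index[(min(i, j), max(i, j))]][j * k + i] = 1
    return P

def transpose_times(P, v):
    """Compute P' v for P given as a list of rows."""
    return [sum(P[r][c] * v[r] for r in range(len(P))) for c in range(len(P[0]))]

M = [[4, 1, 2],
     [1, 5, 3],
     [2, 3, 6]]
P3 = build_P(3)
```

For the symmetric matrix `M` above, `transpose_times(P3, vech(M))` reproduces `vec(M)`.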

The general multivariate skew-symmetric densities (generalizing (2.1)) we are considering
are of the form (1.2), with $f$ and $\Pi$ satisfying the general conditions (a)-(c).
The symmetric kernel $f$ moreover is supposed to have identity scatter matrix $I_k$, which
provides the required identification constraint for $\Sigma$.

As in the univariate setup, we need to impose some mild regularity assumptions on $f$
and $\Pi$.

Assumption (B1). The mapping $z \mapsto f(z)$ is differentiable, with gradient $\dot f$ such that,
letting $\varphi_f := -\dot f/f$, the $k \times k$ information matrix for location $\Sigma^{-1/2} I_f \Sigma^{-1/2}$, with

\[ I_f := \int_{\mathbb{R}^k} \varphi_f(z)\varphi_f'(z) f(z)\,dz, \]

is finite and invertible.

Assumption (B1+). Same as (B1), but the $k(k+1)/2 \times k(k+1)/2$ information matrix
for scatter (actually, for $\Sigma^{1/2}$, or more precisely, for $\mathrm{vech}(\Sigma^{1/2})$, as $\Sigma^{1/2}$ is symmetric)
$P_k(\Sigma^{-1/2} \otimes I_k) J_f (\Sigma^{-1/2} \otimes I_k) P_k'$, with

\[ J_f := \int_{\mathbb{R}^k} \mathrm{vec}(z\varphi_f'(z) - I_k)\,(\mathrm{vec}(z\varphi_f'(z) - I_k))' f(z)\,dz, \]

moreover is finite and invertible.

Assumption (B2). (i) The mapping $z \mapsto \Pi(z,\delta)$ is differentiable, and has gradient 0
at $\delta = 0$. (ii) The mapping $\delta \mapsto \Pi(z,\delta)$ is differentiable at $\delta = 0$ for all $z \in \mathbb{R}^k$, with
gradient (at $\delta = 0$) $\operatorname{grad}_\delta \Pi(z,\delta)|_{\delta=0} =: \psi(z)$ such that $\psi$ admits a primitive $\Psi$, that is,
a real-valued function $z \mapsto \Psi(z)$ such that $\operatorname{grad}_z \Psi(z) = \psi(z)$.

Assumption (B2+). Same as (B2), but the $k \times k$ matrix

\[ \int_{\mathbb{R}^k} \psi(z)\psi'(z) f(z)\,dz \]

moreover is finite and invertible.

These assumptions admit the same interpretation as in the univariate case, and basically
ensure the existence of a finite Fisher information matrix. The standardization
issue also calls for the same comments as in Section 2.1. The interpretation of the scatter
matrix $\Sigma$ is related to the choice of a standardization constraint on $f$. If we impose that $Z$
with p.d.f. $f$ has unit covariance matrix, then $\Sigma = \int_{\mathbb{R}^k} (x-\mu)(x-\mu)' f^\Pi_{\mu,\Sigma,0}(x)\,dx$. However,
concepts of scatter that make sense irrespective of the underlying density also can
be used in this multivariate setup, such as the celebrated Tyler matrix $V_{\mathrm{Tyler}}$ (Tyler [22]),
defined as the unique symmetric positive definite matrix $V$ with $\mathrm{tr}\,V = k$ satisfying

\[ E\big[(X-\mu)(X-\mu)'/((X-\mu)'V^{-1}(X-\mu))\big] = k^{-1}V. \]

Note however that the Tyler matrix $V_{\mathrm{Tyler}}$ in fact is a shape matrix, not a scatter matrix:
the corresponding scatter is $\Sigma = \sigma V_{\mathrm{Tyler}}$, with $\sigma = k^{-1}\mathrm{tr}(\Sigma)$. As in the univariate case,
the scatter $\Sigma$, for the kernel $f$, safely and without any loss of generality, can be fixed to
identity for identification purposes, implying that, for $f$, $\sigma$ takes value 1, while $V_{\mathrm{Tyler}}$
is an identity matrix. As in the univariate case, this choice has no impact on the final
results.

Here also, we could relax classical differentiability conditions by considering weaker
differentiability and generalized Fisher information concepts, at the expense, however, of
non-negligible technical complications.

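Under Assumptions (B1) and (B2), the score vector $\ell_{f;\vartheta}$, at $\vartheta_0 := (\mu', (\mathrm{vech}\,\Sigma^{1/2})', 0')'$,
takes the form

\[ \ell_{f;\vartheta_0}(x) := \operatorname{grad}_\vartheta \log f^\Pi_\vartheta(x)\big|_{\vartheta_0} =: \big(\ell^{1\prime}_{f;\vartheta_0}(x), \ell^{2\prime}_{f;\vartheta_0}(x), \ell^{3\prime}_{f;\vartheta_0}(x)\big)'
= \begin{pmatrix} \Sigma^{-1/2}\varphi_f(\Sigma^{-1/2}(x-\mu)) \\ P_k(\Sigma^{-1/2} \otimes I_k)\,\mathrm{vec}\big(\Sigma^{-1/2}(x-\mu)\,\varphi_f'(\Sigma^{-1/2}(x-\mu)) - I_k\big) \\ 2\psi(\Sigma^{-1/2}(x-\mu)) \end{pmatrix}, \]

where $\otimes$ stands for the standard Kronecker product. Note that, for $k = 1$, this score
vector coincides with the one we obtained in Section 2.1. Under Assumptions (B1+)
and (B2+), the corresponding Fisher information matrix

\[ \Gamma_{f;\vartheta_0} := |\Sigma|^{-1/2} \int_{\mathbb{R}^k} \ell_{f;\vartheta_0}(x)\,\ell'_{f;\vartheta_0}(x)\, f(\Sigma^{-1/2}(x-\mu))\,dx \]

exists and is finite, and naturally partitions into

\[ \Gamma_{f;\vartheta_0} = \begin{pmatrix} \Gamma^{11}_{f;\vartheta_0} & 0 & \Gamma^{13}_{f;\vartheta_0} \\ 0 & \Gamma^{22}_{f;\vartheta_0} & 0 \\ \Gamma^{13\prime}_{f;\vartheta_0} & 0 & \Gamma^{33}_{f;\vartheta_0} \end{pmatrix}, \]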

with

\[ \Gamma^{33}_{f;\vartheta_0} := 4\int_{\mathbb{R}^k} \psi(z)\psi'(z) f(z)\,dz \quad \text{and} \quad \Gamma^{13}_{f;\vartheta_0} := 2\Sigma^{-1/2}\int_{\mathbb{R}^k} \varphi_f(z)\psi'(z) f(z)\,dz. \]

As in the univariate case, the blocks of zeroes in $\Gamma_{f;\vartheta_0}$ readily follow from symmetry
arguments and, without loss of generality, we can focus our attention on the submatrix

\[ \Gamma^0_{f;\vartheta_0} := \begin{pmatrix} \Gamma^{11}_{f;\vartheta_0} & \Gamma^{13}_{f;\vartheta_0} \\ \Gamma^{13\prime}_{f;\vartheta_0} & \Gamma^{33}_{f;\vartheta_0} \end{pmatrix}, \]

the existence of which only requires Assumptions (B1) and (B2+). In the univariate case,
the $2 \times 2$ matrix $\Gamma^0_{f;\vartheta_0}$ was either full-rank or singular with rank 1; here, the $2k \times 2k$
matrix $\Gamma^0_{f;\vartheta_0}$ can be singular with any rank ranging from $k$ to $2k - 1$ (note that the lower
bound $k$ is a direct consequence of either Assumption (B1) or (B2+)).

The following proposition fully characterizes, for each possible rank $2k - m$, $m \in \{1, \dots, k\}$,
the relation between the kernel $f$ and the skewing function $\Pi$ causing such
degeneracy (for simplicity, we restrict to a characterization of the singularity of $\Gamma^0_{f;\vartheta_0}$).

Proposition 3.1. Let the symmetric kernel $f$ and the skewing function $\Pi$ satisfy
Assumptions (B1) and (B2+). The following statements are equivalent:

(i) the $2k \times 2k$ matrix $\Gamma^0_{f;\vartheta_0}$ is singular with rank $2k - m$, $1 \le m \le k$, for any $\vartheta_0$;

(ii) denoting by $Z$ a random $k$-vector with p.d.f. $f$, there exists a $k \times k$ orthogonal
matrix $O' = (O_1', O_2')$, where $O_1'$ and $O_2'$ are $k \times m$- and $k \times (k-m)$-dimensional,
respectively, such that, letting $Y := OZ$ and $y := Oz$, for Lebesgue-almost all
$O_2 z = (y_{m+1}, \dots, y_k)' \in \mathbb{R}^{k-m}$, the density of $O_1 Z = (Y_1, \dots, Y_m)'$ conditional on
$O_2 Z = (Y_{m+1}, \dots, Y_k)' = (y_{m+1}, \dots, y_k)'$ belongs to the exponential family

\[ \Big\{ (y_1, \dots, y_m) \mapsto g_a(y_1, \dots, y_m) := C^{-1}\exp(-a\Psi(O'y)) \;\Big|\; a \text{ such that } C = C(y_{m+1}, \dots, y_k) := \int_{\mathbb{R}^m} \exp(-a\Psi(O'y))\,dy_1 \cdots dy_m < \infty \Big\} \tag{3.1} \]

with parameter $a$ and minimal sufficient statistic $\Psi(O'(Y_1, \dots, Y_m, y_{m+1}, \dots, y_k)')$.
Note that the natural parameter space

\[ \mathcal{A} = \mathcal{A}(y_{m+1}, \dots, y_k) := \Big\{ a \in \mathbb{R} \text{ such that } \int_{\mathbb{R}^m} \exp(-a\Psi(O'y))\,dy_1 \cdots dy_m < \infty \Big\} \]

of the exponential family (3.1) in principle also depends on $(y_{m+1}, \dots, y_k)$. Natural
parameters in exponential families being well identified, the values $a_\Pi(y_{m+1}, \dots, y_k)$ of
the natural parameter $a$ achieving, whenever condition (ii) of Proposition 3.1 holds,
the matchings $f = g_a$, are uniquely defined for Lebesgue-almost all $(k-m)$-tuples
$(y_{m+1}, \dots, y_k)$, yielding exponential densities $g_{a_\Pi(y_{m+1}, \dots, y_k)}$.
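
The intermediate ranks can be illustrated in dimension $k = 2$. The sketch below is built on simplifying assumptions that are not in the text: the kernel has independent symmetric marginals with unit variance (standard normal, or logistic rescaled to unit variance), the skewing function is $\Phi(\delta' z)$, so that $\psi(z) = z/\sqrt{2\pi}$, and $\Sigma = I_2$. Under independence and symmetry all cross-coordinate entries vanish, so the matrix is assembled from one-dimensional integrals. All function names are illustrative.

```python
import math

def simpson(g, a, b, n=4000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def rank(A, tol=1e-8):
    """Rank by Gaussian elimination with partial pivoting."""
    A = [row[:] for row in A]
    r = 0
    for c in range(len(A[0])):
        p = max(range(r, len(A)), key=lambda i: abs(A[i][c]), default=None)
        if p is None or abs(A[p][c]) < tol:
            continue
        A[r], A[p] = A[p], A[r]
        for i in range(r + 1, len(A)):
            m = A[i][c] / A[r][c]
            A[i] = [x - m * y for x, y in zip(A[i], A[r])]
        r += 1
    return r

C0 = 1 / math.sqrt(2 * math.pi)      # derivative of Phi at 0
S = math.pi / math.sqrt(3)           # rescales the logistic to unit variance

def logistic_pdf(z):
    e = math.exp(-abs(S * z))
    return S * e / (1 + e) ** 2

# Each marginal is a pair (density, location score -f'/f).
normal = (lambda z: C0 * math.exp(-0.5 * z * z), lambda z: z)
logistic = (logistic_pdf, lambda z: S * math.tanh(S * z / 2))

def gamma0(marginals, lim=30.0):
    """2k x 2k matrix Gamma^0 at Sigma = I_k for a product kernel and psi(z) = C0 * z."""
    k = len(marginals)
    G = [[0.0] * (2 * k) for _ in range(2 * k)]
    for i, (pdf, phi) in enumerate(marginals):
        mean = lambda g: simpson(lambda z: g(z) * pdf(z), -lim, lim)
        G[i][i] = mean(lambda z: phi(z) ** 2)                      # block Gamma^11
        G[k + i][k + i] = 4 * C0 ** 2 * mean(lambda z: z * z)      # block Gamma^33
        G[i][k + i] = G[k + i][i] = 2 * C0 * mean(lambda z: phi(z) * z)  # block Gamma^13
    return G

ranks = {
    "normal x normal": rank(gamma0([normal, normal])),
    "normal x logistic": rank(gamma0([normal, logistic])),
    "logistic x logistic": rank(gamma0([logistic, logistic])),
}
```

The three kernels give ranks 2, 3 and 4, that is, $2k - m$ with $m = 2, 1, 0$: each Gaussian coordinate, whose conditional density lies in the exponential family generated by $\Psi(z) = \|z\|^2/(2\sqrt{2\pi})$, removes one unit of rank.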

Proposition 3.1 has the following straightforward corollary. 

Corollary 3.1. (i) Let $f$ be a symmetric kernel satisfying Assumption (B1): there exists
a skewing function $\Pi_f$ such that the rank of $\Gamma^0_{f;\vartheta_0}$ reaches its minimal value $k$ for any $\vartheta_0$.
(ii) Let $\Pi$ be a skewing function satisfying Assumption (B2) with $\Psi$ such that, for
some $a_\Pi$,

(iia) $z \mapsto g_{a_\Pi}(z) := C^{-1}\exp(-a_\Pi\Psi(z))$ is a p.d.f. with identity scatter matrix, and

(iib) $\int_{\mathbb{R}^k} \psi(z)\psi'(z) f(z)\,dz$, with $f = g_{a_\Pi}$, is finite and invertible (meaning that (B2+) is satisfied).

Then, there exists a symmetric kernel $f_\Pi$ such that the rank of $\Gamma^0_{f_\Pi;\vartheta_0}$ reaches its minimal
value $k$ for any $\vartheta_0$.

Proof of Proposition 3.1. Clearly, $\Gamma^0_{f;\vartheta_0}$ has rank $2k - m$, $1 \le m \le k$, if and only if $m$
is the largest integer such that there exist $(k \times m)$ matrices $V$ and $W$ with $(V', -W')'$
of rank $m$ such that

\[ V'\varphi_f = W'\psi \quad \text{Lebesgue-a.e.} \tag{3.2} \]

(note that the matrix $\Sigma^{-1/2}$ is incorporated in $V$, and hence plays no role in the
characterization (3.2)). Both $V$ and $W$ are of maximal rank $m$. Suppose indeed
that $V$ is not: then, there exists $0 \neq \lambda \in \mathbb{R}^m$ such that $V\lambda = 0$, so that $\lambda'W'\psi =
\lambda'V'\varphi_f = 0$ (Lebesgue-a.e.). Then, in view of Assumption (B2), $W\lambda = 0$ as well, hence
$\lambda'(V', -W') = 0$, which contradicts the assumption that $(V', -W')'$ has rank $m$. The
same reasoning holds for $W$. It follows that $V$, without loss of generality, can be
assumed to be orthonormal, and therefore can be extended into an orthogonal matrix
$O' := (V, v)$, $v$ being the $k \times (k-m)$ orthogonal complement to $V$. The necessary and
sufficient condition (3.2) then takes the form

\[ [O\varphi_f]_{1 \dots m} = W'\psi \quad \text{Lebesgue-a.e.} \tag{3.3} \]

where $[O\varphi_f]_{1 \dots m}$ stands for the first $m$ rows of $O\varphi_f$.

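Define $Y := OZ$. Since $Z$ has density $f$, $Y$ has density $y \mapsto f^Y(y) = f(O'y)$. This density $f^Y$ has gradient $\dot f^Y$ and score $\varphi_{f^Y}$, with

$$\varphi_{f^Y}(y) := -\dot f^Y(y)/f^Y(y) = -O\dot f(O'y)/f(O'y) = O\varphi_f(O'y).$$

This, combined with (3.3), yields

$$[\varphi_{f^Y}(y)]_{1\ldots m} = W'\psi(O'y) \quad \text{Lebesgue-a.e.}$$

or, more explicitly,

$$\begin{pmatrix} \partial_{y_1}\log f^Y(y) \\ \vdots \\ \partial_{y_m}\log f^Y(y) \end{pmatrix} = -W'\psi(O'y) \quad \text{Lebesgue-a.e.} \tag{3.4}$$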

As a function of $(y_1,\ldots,y_m)$, the left-hand side in (3.4) has primitive

$$\log f^Y(y_1,\ldots,y_m,y_{m+1},\ldots,y_k) + c(y_{m+1},\ldots,y_k),$$

where the "integration constant" $c$ is an arbitrary function of $(y_{m+1},\ldots,y_k)$. The right-hand side therefore has the same primitive, still up to an additive $c(y_{m+1},\ldots,y_k)$. Now, partitioning $O'$ into $(O_1', O_2')$ where $O_1'$ and $O_2'$ are $k \times m$ and $k \times (k-m)$, respectively, a necessary condition for

$$(y_1,\ldots,y_m) \mapsto W'\psi(O_1'(y_1,\ldots,y_m)' + O_2'(y_{m+1},\ldots,y_k)')$$

to be the gradient of a scalar function is $W' = aO_1$ for some $a = a(y_{m+1},\ldots,y_k) \in \mathbb{R}$: in view of Assumption (B2), a primitive of

$$(y_1,\ldots,y_m) \mapsto aO_1\psi(O_1'(y_1,\ldots,y_m)' + O_2'(y_{m+1},\ldots,y_k)')$$

is then $a\Psi(O_1'(y_1,\ldots,y_m)' + O_2'(y_{m+1},\ldots,y_k)')$, up to the usual additive constant - here, an arbitrary function of $(y_{m+1},\ldots,y_k)$. The necessary and sufficient condition (3.4) thus takes the further form

$$f^Y(y) = \exp(-c(y_{m+1},\ldots,y_k))\,\exp(-a\Psi(O_1'(y_1,\ldots,y_m)' + O_2'(y_{m+1},\ldots,y_k)'))$$

for some $a = a(y_{m+1},\ldots,y_k) \in \mathbb{R}$; in other words, the conditional density of $(Y_1,\ldots,Y_m)'$ given $(Y_{m+1},\ldots,Y_k)' = (y_{m+1},\ldots,y_k)'$ is

$$\frac{f^Y(y_1,\ldots,y_m,y_{m+1},\ldots,y_k)}{\int_{\mathbb{R}^m} f^Y(y_1,\ldots,y_m,y_{m+1},\ldots,y_k)\,dy_1\cdots dy_m} = C(y_{m+1},\ldots,y_k)\,\exp(-a\Psi(O_1'(y_1,\ldots,y_m)' + O_2'(y_{m+1},\ldots,y_k)')), \tag{3.5}$$

where $C^{-1}(y_{m+1},\ldots,y_k) := \int_{\mathbb{R}^m} \exp(-a\Psi(O_1'(y_1,\ldots,y_m)' + O_2'(y_{m+1},\ldots,y_k)'))\,dy_1\cdots dy_m$, for some $a = a(y_{m+1},\ldots,y_k) \in \mathbb{R}$.

Summing up, there exists an orthogonal matrix $O' = (O_1', O_2')$ such that, for any $(y_{m+1},\ldots,y_k)' \in \mathbb{R}^{k-m}$, the density of $O_1 Z =: (Y_1,\ldots,Y_m)'$ conditional on $O_2 Z = (y_{m+1},\ldots,y_k)'$ belongs to the exponential family with minimal sufficient statistic

$$\Psi(O_1'(Y_1,\ldots,Y_m)' + O_2'(y_{m+1},\ldots,y_k)'),$$

as was to be proved. □
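The change of variables used in the proof, namely that the score of $Y = OZ$ is $O\varphi_f(O'y)$, can be checked numerically. The following is a minimal sketch under stated assumptions: the kernel `log_f` (independent logistic-type components) and the helper names `score_f` and `numerical_score` are hypothetical choices made for illustration only and do not appear in the text.

```python
import numpy as np

def log_f(z):
    # Hypothetical centrally symmetric kernel, up to an additive constant:
    # log f(z) = -sum_i 2*log(cosh(z_i / 2)) (independent logistic components).
    return -np.sum(2.0 * np.log(np.cosh(z / 2.0)))

def score_f(z):
    # Score phi_f(z) = -grad log f(z) = tanh(z / 2), componentwise.
    return np.tanh(z / 2.0)

def numerical_score(log_density, y, h=1e-6):
    # Minus the central finite-difference gradient of a log-density at y.
    grad = np.zeros(y.size)
    for i in range(y.size):
        e = np.zeros(y.size)
        e[i] = h
        grad[i] = (log_density(y + e) - log_density(y - e)) / (2.0 * h)
    return -grad

rng = np.random.default_rng(0)
k = 3
# Random orthogonal matrix O obtained from a QR decomposition.
O, _ = np.linalg.qr(rng.standard_normal((k, k)))
y = rng.standard_normal(k)

# Density of Y = OZ is y -> f(O'y); compare its numerical score with O phi_f(O'y).
lhs = numerical_score(lambda u: log_f(O.T @ u), y)
rhs = O @ score_f(O.T @ y)
max_abs_diff = float(np.max(np.abs(lhs - rhs)))
print(max_abs_diff)
```

The two sides agree up to finite-difference error, which is the identity that turns (3.3) into a statement about the first $m$ partial derivatives of $\log f^Y$.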

So far, we have formally solved the singularity problem for the $2k \times 2k$ information matrix $\Gamma_{f,\Pi;\vartheta_0}$. As in the univariate case, the singularity problem for the full $k(k+5)/2 \times k(k+5)/2$ information matrix is slightly different. Indeed, the existence of that full matrix requires the stronger Assumption (B1+), as the information for scatter, which is not present in the $2k \times 2k$ matrix, has to exist as well; this adds a further condition on the exponential family in Proposition 3.1. Nevertheless, there is no fundamental difference between the two setups: it could only happen that a solution to the singularity problem for the $2k \times 2k$ matrix is not a solution of the larger problem because the full matrix simply does not exist, hence cannot be singular. This explains why, for the sake of simplicity, we state the results of this section in terms of the $2k \times 2k$ matrix. The message is clear: the tale of two densities has turned into a more elaborate plot, starring a much larger number of actors.
3.2. Further examples

As in the univariate case, we now analyze three concrete examples of skewing functions in 
the light of the findings of the previous section, which provides the theoretical statement 
in Proposition 3.1 with some further intuition. 

The first example is the natural extension of the univariate skewing function $\Pi_1$ to the multivariate context, with $\Pi_1^{(k)}(z,\delta) := \Pi(\delta'z)$, where $\Pi: \mathbb{R} \to [0,1]$ satisfies exactly the same conditions as in Section 2.2. The resulting class of skewing functions $\Pi_1^{(k)}$ is the most common one in the literature. A skewing function $\Pi = \Phi$ combined with a multinormal kernel $f = \phi_k$ yields the class of skew-multinormal densities of Azzalini and Dalla Valle [8]. When $f$ is only required to be spherically symmetric and the skewing function $\Pi$ is a univariate symmetric c.d.f., we obtain the class of skew-elliptical distributions as defined by Azzalini and Capitanio [6], itself a subclass of the generalized skew-elliptical distributions of Genton and Loperfido [15] where $\Pi$ is left unspecified. Finally, relaxing the assumption of spherical symmetry into the weaker assumption of central symmetry, we retrieve the popular class of multivariate skew-symmetric distributions analyzed in Ley and Paindaveine [17].

Direct calculation yields $\psi_1^{(k)}(z) = \dot\Pi(0)z$, hence, writing $z = (z_1', z_2')'$ with $z_1 \in \mathbb{R}^m$ and $z_2 \in \mathbb{R}^{k-m}$, $m = 1,\ldots,k$, we obtain minimal sufficient statistics of the form

$$\Psi_1^{(k)}(O'(z_1', z_2')') = \dot\Pi(0)(z_1'z_1/2 + z_2'z_2/2)$$

for a $k \times k$ orthogonal matrix decomposing into $O' = (O_1', O_2')$. Quite nicely, the possibility of separating the vectors $z_1$ and $z_2$ in $\Psi_1^{(k)}(O'(z_1', z_2')')$ allows us to express the corresponding exponential densities in terms of $z_1$ only, yielding the $m$-dimensional Gaussian densities

$$z_1 \mapsto \exp(-a\dot\Pi(0)z_1'z_1/2)\,(2\pi/(a\dot\Pi(0)))^{-m/2}.$$

As in the univariate case, the sign of $a$ is the same as that of $\dot\Pi(0)$. Degenerate information thus takes place iff, for some adequate rotation $OZ$ of $Z \sim f$, the $m$-dimensional marginal distribution of $[OZ]_{1\ldots m}$ is standard $m$-variate normal. Note that this does not imply $k$-variate normal distributions. Consider, for example, a random $k$-vector whose first $m$ components are i.i.d. standard Gaussian, and independent of the remaining $k - m$ ones, themselves i.i.d. with some other standardized univariate symmetric distribution. In such a case, the conditional distribution of the first $m$ components given the last $k - m$ ones belongs to the exponential family of distributions just described. Thus, contrary to the univariate setup, multinormal densities are not the only symmetric kernels leading to singular Fisher information when combined with the skewing functions $\Pi_1^{(k)}$. Multinormal kernels, however, are the only ones for which Fisher information has minimal rank (corresponding to $m = k$). All this is in total accordance with earlier findings by Ley and Paindaveine [17], who examine in detail the singularity issues related to skew-symmetric distributions generated via $\Pi_1^{(k)}$. We therefore refer the reader to that reference for more details about the skewing functions $\Pi_1^{(k)}$, especially so for the special case of skew-elliptical distributions.
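The non-Gaussian counterexample above can be illustrated numerically. The sketch below is an illustration under stated assumptions, not part of the original argument: it takes $k = 3$, $m = 2$, unit-variance Laplace laws for the non-Gaussian components, and $\Pi = \Phi$ (so that the derivative of $\Pi$ at $0$ is $1/\sqrt{2\pi}$); the names `score_f`, `psi_1`, `V` and `W` are hypothetical. It checks that a relation of the form $V'\varphi_f = W'\psi$ holds along the Gaussian coordinates, and that the Laplace coordinate admits no such proportionality.

```python
import numpy as np

k, m = 3, 2                      # hypothetical dimensions, with m < k
c = 1.0 / np.sqrt(2.0 * np.pi)   # derivative of the standard normal c.d.f. at 0

def score_f(z):
    # Score -grad log f of the illustrative kernel, row-wise for z of shape (n, k).
    out = np.empty_like(z)
    out[:, :m] = z[:, :m]                          # standard Gaussian components
    out[:, m:] = np.sqrt(2.0) * np.sign(z[:, m:])  # unit-variance Laplace components
    return out

def psi_1(z):
    # Gradient in delta, at delta = 0, of Pi(delta'z) with Pi the normal c.d.f.
    return c * z

# V selects the m Gaussian coordinates; W = V / c makes V'phi_f and W'psi coincide.
V = np.vstack([np.eye(m), np.zeros((k - m, m))])
W = V / c

rng = np.random.default_rng(1)
Z = rng.standard_normal((1000, k))

# Largest discrepancy between V'phi_f(z) and W'psi(z) over the sample points.
gaussian_gap = float(np.max(np.abs(score_f(Z) @ V - psi_1(Z) @ W)))

# Along the Laplace coordinate, the ratio score / psi is far from constant.
ratio = score_f(Z)[:, m] / psi_1(Z)[:, m]
laplace_ratio_spread = float(ratio.max() - ratio.min())
print(gaussian_gap, laplace_ratio_spread)
```

Only the $m$ Gaussian directions contribute to the degeneracy in this example; the check on the Laplace coordinate is an illustration along one coordinate axis, not a proof that no other direction works.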
Our second example corresponds to another classical type of skewing functions, namely

$$\Pi_2^{(k)}(z,\delta) := \Pi(\delta'z\,(\nu+k)^{1/2}(z'z+\nu)^{-1/2}), \tag{3.6}$$

where $\Pi$ satisfies the same properties as above, and $\nu > 0$. Clearly, as $\nu \to \infty$, the skewing functions $\Pi_2^{(k)}$ tend to skewing functions of the $\Pi_1^{(k)}$ type just considered. When $\Pi$ in (3.6) corresponds to the c.d.f. $T_1(\cdot\,;\nu+k)$ of a Student variable with $\nu + k$ degrees of freedom, and the symmetric kernel used is that of a $k$-dimensional $t$ variable with $\nu$ degrees of freedom, then we obtain the celebrated multivariate skew-$t$ distributions of Azzalini and Capitanio [7] - up to some minor details, since their non-standardized skewing functions are of the form

$$T_1(\delta'\omega^{-1}(x-\mu)\,(\nu+k)^{1/2}((x-\mu)'\Sigma^{-1}(x-\mu)+\nu)^{-1/2};\ \nu+k),$$

with $\omega = \mathrm{diag}(\Sigma_{11},\ldots,\Sigma_{kk})^{1/2}$. Elementary calculation yields

$$\psi_2^{(k)}(z) = \dot\Pi(0)\,z\,(\nu+k)^{1/2}(z'z+\nu)^{-1/2},$$



hence minimal sufficient statistics and exponential densities of the form

$$\Psi_2^{(k)}(z) = \dot\Pi(0)(\nu+k)^{1/2}(z'z+\nu)^{1/2}$$

and

$$\frac{\exp(-a\dot\Pi(0)(\nu+k)^{1/2}(z'z+\nu)^{1/2})}{\int_{\mathbb{R}^m} \exp(-a\dot\Pi(0)(\nu+k)^{1/2}(z'z+\nu)^{1/2})\,dz_1\cdots dz_m} \tag{3.7}$$



respectively. Here again, the sign of $a$ is determined by the sign of $\dot\Pi(0)$. Azzalini and Genton [9] conjecture that, as long as $\nu$ is finite, multivariate skew-$t$ distributions should be free of singularity problems. DiCiccio and Monti [13] prove the conjecture in the univariate case, Ley and Paindaveine [18] in any dimension $k$. Proposition 3.1 confirms those earlier results, as (3.7), whatever the value of $a$, cannot be derived from a $k$-dimensional $t$ distribution with $\nu$ degrees of freedom. Actually, letting $X = (X_1', X_2')'$ follow a $k$-variate $t$ distribution where $X_1$ and $X_2$, respectively, are $m$- and $(k-m)$-dimensional random vectors, it can be shown that the density of $X_1 \mid X_2 = x_2$ cannot be of the form (3.7).
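The "elementary calculation" behind the second example can be sanity-checked numerically: the gradient of the sufficient statistic should reproduce the gradient-type function of the skewing function, and the latter should approach the first example's linear form as $\nu$ grows. The sketch below is illustrative only; the function names `psi_2`, `Psi_2` and `grad_fd`, the constant `c` standing for the derivative of $\Pi$ at $0$ (set to 1 here), and the test point are assumptions made for the example.

```python
import numpy as np

def psi_2(z, nu, c=1.0):
    # c * z * (nu + k)^{1/2} * (z'z + nu)^{-1/2}, with c the derivative of Pi at 0.
    k = z.size
    return c * z * np.sqrt(nu + k) / np.sqrt(z @ z + nu)

def Psi_2(z, nu, c=1.0):
    # Candidate primitive: c * (nu + k)^{1/2} * (z'z + nu)^{1/2}.
    k = z.size
    return c * np.sqrt(nu + k) * np.sqrt(z @ z + nu)

def grad_fd(fun, z, h=1e-6):
    # Central finite-difference gradient of a scalar function at z.
    grad = np.zeros_like(z)
    for i in range(z.size):
        e = np.zeros_like(z)
        e[i] = h
        grad[i] = (fun(z + e) - fun(z - e)) / (2.0 * h)
    return grad

z = np.array([0.3, -1.2, 0.7])   # arbitrary test point
nu = 5.0                         # arbitrary finite degrees of freedom

# The gradient of Psi_2 should match psi_2 up to finite-difference error.
primitive_gap = float(np.max(np.abs(grad_fd(lambda u: Psi_2(u, nu), z) - psi_2(z, nu))))

# For very large nu, psi_2(z) should be close to c * z (first-example form).
limit_gap = float(np.max(np.abs(psi_2(z, 1e8) - z)))
print(primitive_gap, limit_gap)
```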

We conclude this section with a possible extension of the singularity-free univariate skewing function $\Pi_3$ of Section 2.2. Consider $\Pi_3^{(k)}(z,\delta) := \Pi(\delta'\,\mathrm{Sin}(z))$, with $\Pi$ defined as above and $\mathrm{Sin}(z) := (\sin(z_1),\ldots,\sin(z_k))'$. Checking the validity of Assumption (B2+) is immediate, and one also directly obtains that $\psi_3^{(k)}(z) = \dot\Pi(0)\,\mathrm{Sin}(z)$ and $\Psi_3^{(k)}(z) = -\dot\Pi(0)(\cos(z_1) + \cdots + \cos(z_k))$. The same reasoning as for $\Pi_3$ readily yields that the natural parameter space related to the exponential family with minimal sufficient statistic $\Psi_3^{(k)}$ is empty, hence skewing functions of the type $\Pi_3^{(k)}$ can be used without worrying about possibly singular Fisher information.
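One way to see why the natural parameter space is empty in this third example is the following elementary bound. It is offered as an illustrative sketch and may differ from the univariate reasoning of Section 2.2: since the cosine is bounded, the sufficient statistic is bounded, so the integrand defining the normalizing constant in (3.1) is bounded away from zero on all of $\mathbb{R}^m$, whatever the rotation $O$. Here $\dot\Pi(0)$ denotes the derivative of $\Pi$ at $0$.

```latex
% Sketch under the notation above: \Psi_3^{(k)}(z) = -\dot\Pi(0)\sum_{i=1}^k \cos(z_i).
\begin{align*}
|\Psi_3^{(k)}(z)| &= |\dot\Pi(0)|\,\Big|\sum_{i=1}^k \cos(z_i)\Big| \le k\,|\dot\Pi(0)|
  \quad \text{for all } z \in \mathbb{R}^k, \\
\exp\big(-a\,\Psi_3^{(k)}(O'y)\big) &\ge \exp\big(-|a|\,k\,|\dot\Pi(0)|\big) > 0
  \quad \text{for all } y \in \mathbb{R}^k \text{ and all } a \in \mathbb{R}, \\
\int_{\mathbb{R}^m} \exp\big(-a\,\Psi_3^{(k)}(O'y)\big)\,dy_1\cdots dy_m &= \infty
  \quad \text{for all } a \in \mathbb{R}.
\end{align*}
```

Under this bound no value of $a$ yields a finite normalizing constant, which is the emptiness of the natural parameter space.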
4. Final comments

In this paper, we fully dispel the widespread opinion that Gaussian densities, in the con- 
text of skew-symmetric distributions, constitute an intriguing worst-case situation, being 
the only ones (possibly, after restriction to linear subspaces) leading to degenerate Fisher 
information matrices in the vicinity of symmetry. Our main result provides a complete 
characterization of that information degeneracy phenomenon, which generalizes and ex- 
tends all previous results of that type, and highlights the link between the symmetric 
kernel and the skewing function causing singularity. We also show how that link, in the 
univariate as well as in the multivariate case, can be described as a mismatch between 
two densities, in which the Gaussian distribution plays no particular role. By avoiding 
such mismatch, one can deal with skew-symmetric distributions without worrying about 
singular Fisher information and its consequences. 